Abstract: This paper presents a practical writing/reading scheme for
nonvolatile memories, called balanced modulation, that minimizes the
asymmetric component of errors. The main idea is to encode data using
a balanced error-correcting code. When reading information from a
block, the scheme adjusts the reading threshold so that the resulting
word is also balanced or approximately balanced. Balanced modulation
has suboptimal performance for any cell-level distribution (its number
of errors is at most twice that of the optimal threshold), and it can
be easily implemented in current nonvolatile memory systems.
Furthermore, we study the construction of balanced error-correcting
codes, in particular balanced LDPC codes. These codes have very
efficient encoding and decoding algorithms, and they are more
efficient than prior constructions of balanced error-correcting codes.

Index Terms: Balanced Modulation, Balanced LDPC Codes,
Dynamic Reading Thresholds.



I. Introduction 

NONVOLATILE memories, such as EPROM, EEPROM, flash memory, and
phase-change memory (PCM), are memories that retain their data content
even without a power supply. This property enables them to be used in
a wide range of applications, including cellphones, consumer
electronics, automotive systems, and computers. Many research studies
have been carried out on nonvolatile memories because of their unique
features, attractive applications, and large market demand.

An important challenge for most nonvolatile memories is 
data reliability. The stored data can be lost due to many 
mechanisms, including cell heterogeneity, programming noise, 
write disturbance, read disturbance, etc. Q, 1151 . From a long- 
term view, the change in data has an asymmetric property. 
For example, the stored data in flash memories is represented 
by the voltage levels of transistors, which drift in one di- 
rection because of charge leakage. In PCM, another class of 
nonvolatile memories, the stored data is determined by the 
electrical resistance of the cells, which drifts due to thermally 
activated crystallization of the amorphous material [21]. All
these mechanisms make the errors in nonvolatile memories
heterogeneous, asymmetric, time-dependent, and unpredictable.
These properties bring substantial difficulties to researchers 
attempting to develop simple and efficient error-correcting 
schemes. 

To date, existing coding schemes for nonvolatile memories commonly use
fixed thresholds to read data. For instance, in flash memories, a
threshold voltage level $v$ is predetermined; when reading data from a
cell, the reader gets '1' if the voltage level is higher than $v$, and
'0' otherwise. To increase data reliability, error-correcting codes
such as Hamming, BCH, Reed-Solomon, and LDPC codes are applied in
nonvolatile memories to combat errors. Because of the asymmetric
nature of nonvolatile memories, a fixed threshold usually introduces
too many asymmetric errors after a long duration [14]; namely, the
number of $1 \to 0$ errors is usually much larger than the number of
$0 \to 1$ errors. To overcome the limitations of fixed thresholds in
reading data in nonvolatile memories, dynamic thresholds are
introduced in this paper. To illustrate, we use flash memories; see
Fig. 1. The top plot is for newly written data, and the bottom plot is
for old data that has been stored for a long time $T$. In the figure,
the left curve indicates the voltage distribution for bit '0' (a bit
'0' is written during programming) and the right curve indicates the
voltage distribution for bit '1'. At time 0 (the moment after
programming), it is best to set the threshold voltage to $v = v_1$ to
separate bits '1' and '0'. After a period of time, however, the
voltage distributions change. In this case, $v_1$ is no longer the
best choice, since it introduces too many $1 \to 0$ errors. Instead,
we can set the threshold voltage to $v = v_2$ (see the second plot in
the figure) to minimize the error probability. This also applies to
other nonvolatile memories, such as PCMs.

Fig. 1. An illustration of the voltage distributions for bit "1" and
bit "0" in flash memories.

Although the best dynamic reading thresholds lead to far fewer errors
than fixed ones, certain difficulties exist in determining their
values at a time $t$. One reason is that accurate level distributions
for bits '1' and '0' at the current time are hard to obtain due to the
lack of time records, the heterogeneity of blocks, and the
unpredictability of exceptions. A possible alternative is to classify
all the cell levels into two groups by unsupervised clustering and
then map them to '1's and '0's. But when the border between '1's and
'0's becomes fuzzy, clustering mistakes may cause a significant number
of reading errors. In view of these considerations, in this paper we
introduce a simple and practical writing/reading scheme for
nonvolatile memories, called balanced modulation, which is based on
the construction of balanced codes (or balanced error-correcting
codes) and aims to minimize the asymmetric component of errors in the
current block.

Balanced codes, whose codewords have an equal number of 1s and 0s,
have been studied extensively in the literature. Knuth, in 1986,
proposed a simple method of constructing balanced codes [10]. In his
method, given an information word of $k$ bits ($k$ is even), the
encoder inverts the first $i$ bits such that the modified word has an
equal number of 1s and 0s. Knuth showed that such an integer $i$
always exists, and it is represented by a balanced word of length $p$.
A codeword then consists of a $p$-bit prefix word and a $k$-bit
modified information word. For decoding, the decoder can easily
retrieve the value of $i$ and then recover the original information
word by inverting the first $i$ bits of the $k$-bit modified
information word again. Knuth's method was later improved or modified
by many researchers [1], [9], [17], [19]. Based on balanced codes, we
obtain a scheme of balanced modulation. It encodes the stored data as
balanced codewords; when reading data from a block, it adjusts the
reading threshold dynamically such that the resulting word is also
balanced (namely, the number of 1s is equal to the number of 0s) or
approximately balanced. We call this dynamic reading threshold a
balancing threshold.
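As an illustration of the prefix-inversion step in Knuth's method, the following is a minimal Python sketch. The function names `knuth_balance` and `knuth_restore` are hypothetical, and the sketch deliberately omits the $p$-bit balanced prefix that encodes $i$; it only shows that a balancing index can be found with one linear scan and that inverting the same prefix again restores the word.

```python
def knuth_balance(word):
    """Return (i, balanced_word) for the smallest i such that inverting
    the first i bits of `word` (a list of 0/1 of even length) balances it."""
    n = len(word)
    if n % 2 != 0:
        raise ValueError("word length must be even")
    weight = sum(word)  # weight of the word with its first 0 bits inverted
    for i in range(n):
        if weight == n // 2:
            return i, [1 - b for b in word[:i]] + list(word[i:])
        # Inverting bit i changes the weight by +1 (bit was 0) or -1 (bit was 1).
        weight += 1 - 2 * word[i]
    # The weight moves in steps of 1 from w to n - w, so n/2 is always reached.
    raise AssertionError("unreachable for even-length words")


def knuth_restore(balanced_word, i):
    """Undo the balancing step by inverting the first i bits again."""
    return [1 - b for b in balanced_word[:i]] + list(balanced_word[i:])


word = [1, 1, 1, 1, 0, 1, 1, 0]
i, balanced = knuth_balance(word)
print(i, balanced, knuth_restore(balanced, i) == word)
```

The scan keeps a running weight instead of recounting ones for every candidate $i$, so the search costs $O(k)$ for a $k$-bit word.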

There are several benefits of applying balanced modulation in
nonvolatile memories. First, it increases the safety gap between 1s
and 0s. With a fixed threshold, the safety gap is determined by the
minimum difference between the cell levels and the threshold. With
balanced modulation, the safety gap is the minimum difference between
the cell levels for 1 and those for 0. Since the level of an
individual cell has a random distribution due to cell-programming
noise [5], [11], the actual value of the charge level varies from one
write to another. In this case, balanced modulation is more robust
than the commonly used fixed-threshold approach in combating
programming noise. Second, as discussed above, balanced modulation is
a very simple solution that minimizes the influence of cell-level
drift. It was shown in [4] that cell-level drift introduces the most
dominant errors in flash memories. Third, balanced modulation can
efficiently reduce errors introduced by some other mechanisms, such as
changes in external temperature and the current leakage of other
reading lines, which shift the cell levels in the same direction.
Generally, balanced modulation is a simple approach that minimizes the
influence of noise asymmetries, and it can be easily implemented on
current memory devices without hardware changes. The balanced
condition on codewords enables us to select a much better threshold
dynamically than the commonly used fixed threshold when reading data
from a block.



The main contributions of the paper are as follows:

1) We study balanced modulation as a simple, practical, and
efficient approach to minimizing the asymmetric component
of errors in nonvolatile memories.

2) A new construction of balanced error-correcting codes, 
called balanced LDPC code, is introduced and analyzed, 
which has a higher rate than prior constructions. 

3) We investigate partial-balanced modulation, for its sim- 
plicity of constructing error-correcting codes, and then 
we extend our discussions from binary cells to multi- 
level cells. 

II. Scope of This Paper 

A. Performance and Implementation 

In the first part of this paper, including Section III, Section IV,
and Section V, we focus on the introduction and performance of
balanced modulation. In particular, we demonstrate that balanced
modulation introduces far fewer errors than the traditional approach
based on fixed thresholds. For any cell-level distribution, the
balancing threshold used in balanced modulation is suboptimal among
all possible reading thresholds in terms of the total number of
errors. This enables balanced modulation to adapt to a variety of
channel characteristics and hence makes it applicable to most types of
nonvolatile memories. Beyond storage systems, balanced modulation can
also be used in optical communication, where the strength of received
signals shifts due to many factors such as the transmission distance,
temperature, etc.

A practical and very attractive aspect of balanced modula- 
tion is that it can be easily implemented in the current systems 
of nonvolatile memories. The only change is that, instead of 
using a fixed threshold in reading a binary vector, it allows 
this threshold to be adaptive. Fortunately, this operation can 
be implemented physically, making the process of data reading 
reasonably fast. In this case, the reading process is based on 
hard decision. 

If we care less about reading speed, we can have soft- 
decision decoding, namely, reading data without using a 
threshold. We demonstrate that the prior knowledge that the 
stored codeword is balanced is very useful. It helps us to better 
estimate the current cell-level distributions, hence, resulting in 
a better performance in bit error rate. 

B. Balanced LDPC Code 

Balanced modulation can efficiently reduce the bit error rate when
reading data from a block. A further question is how to construct
balanced codes that are capable of correcting errors. We call such
codes balanced error-correcting codes. Knuth's method cannot correct
errors. In [18], van Tilborg and Blaum presented a family of balanced
binary error-correcting codes. The idea is to consider balanced blocks
as symbols over an alphabet and to construct error-correcting codes
over that alphabet by concatenating $n$ blocks of length $2l$ each.
Due to the constraint in the code construction, this method achieves
only moderate rates. Error-correcting balanced codes with higher rates
were presented by Al-Bassam and Bose in [2]; however, their
construction considers only the case that the number of errors is at
most 4. In [12], Mazumdar, Roth, and Vontobel studied linear balancing
sets, namely, balancing sets that are linear subspaces of $F^n$, which
are applied in obtaining coding schemes that combine balancing and
error correction. Recently, Weber, Immink, and Ferreira extended
Knuth's method to equip it with error-correcting capabilities [20].
Their idea is to assign different error protection levels to the
prefix and the modified information word in Knuth's construction, so
their construction is a concatenation of two error-correcting codes
with different error-correcting capabilities. In Section VI, we
introduce a new construction of balanced error-correcting codes based
on LDPC codes, called the balanced LDPC code. This construction has a
simple encoding algorithm, and its decoding complexity based on the
message-passing algorithm is asymptotically equal to the decoding
complexity of the original (unbalanced) LDPC code. We demonstrate that
the balanced LDPC code has an error-correcting capability very close
to that of the original (unbalanced) LDPC code.

Fig. 2. The diagram of balanced modulation.

C. Partial-Balanced Modulation and Its Extension 

Our observation is that the task of constructing efficient bal- 
anced error-correcting codes with simple encoding and decod- 
ing algorithms is not simple, but it is much easier to construct 
error-correcting codes that are partially balanced, namely, 
only a certain segment (or subsequence) of each codeword is 
balanced. Motivated by this observation, we propose a variant 
of balanced modulation, called partial-balanced modulation. 
When reading from a block, it adjusts the reading threshold such that
the corresponding segment of the resulting word is balanced.
Partial-balanced modulation has a performance very close to that of
balanced modulation, and it admits much simpler constructions of
error-correcting codes than balanced modulation. Another question that
we address in the third part is how to extend the scheme of balanced
modulation or partial-balanced modulation to nonvolatile memories with
multi-level cells. Details will be provided in Section VII and
Section VIII.

III. Balanced Modulation 

For convenience, we consider different types of nonvolatile memories
in the same framework, where data is represented by cell levels, such
as voltages in flash memories and resistance in phase-change memories.
The scheme of balanced modulation is sketched in Fig. 2. It can be
divided into two steps: a programming step and a reading step.

(1) In the programming step, we encode data based on a balanced
(error-correcting) code. Let $k$ denote the dimension of the code and
$n$ denote the number of cells in a block; then a message
$u \in \{0, 1\}^k$ is mapped to a balanced codeword $x \in \{0, 1\}^n$
such that $|x| = \frac{n}{2}$, where $|x|$ is the Hamming weight of
$x$.

(2) In the reading step, let $c = c_1 c_2 \ldots c_n \in \mathbb{R}^n$
be the current levels of the $n$ cells to read. A balancing threshold
$v$ is determined based on $c$ such that the resulting word, denoted
by $y = y_1 y_2 \ldots y_n$, is also balanced, namely,
$|y| = \frac{n}{2}$. For each $i \in \{1, 2, \ldots, n\}$, $y_i = 1$
if and only if $c_i > v$; otherwise $y_i = 0$. By applying the decoder
of the balanced (error-correcting) code, we get a binary output
$\hat{u}$, which is the message that we read from the block.
Fig. 3. Cell-level distributions for 1 and 0, and the reading threshold.

To build intuition for how balanced modulation works, consider Fig. 3,
which depicts the cell-level distributions for those cells that store
0 or 1. Given a reading threshold $v$, we use $N^{(1 \to 0)}(v)$ to
denote the number of $1 \to 0$ errors and $N^{(0 \to 1)}(v)$ to denote
the number of $0 \to 1$ errors, shown as the tails marked in the
figure. Then

$$N^{(1 \to 0)}(v) = |\{i : x_i = 1, y_i = 0\}|,$$

$$N^{(0 \to 1)}(v) = |\{i : x_i = 0, y_i = 1\}|.$$

It is easy to see that

$$|y| = |x| - N^{(1 \to 0)}(v) + N^{(0 \to 1)}(v),$$

where $|x|$ is the Hamming weight of $x$.

By definition, a balancing threshold is one that makes $y$ balanced;
hence,

$$N^{(1 \to 0)}(v) = N^{(0 \to 1)}(v),$$

i.e., a balancing threshold results in the same number of $1 \to 0$
errors and $0 \to 1$ errors.

We define $N_e(v)$ as the total number of errors based on a reading
threshold $v$; then

$$N_e(v) = N^{(1 \to 0)}(v) + N^{(0 \to 1)}(v).$$

If the cell-level distributions for those cells that store 1 and those
cells that store 0 are known, then the balancing threshold may not be
the best reading threshold that we can have, i.e., $N_e(v)$ may not be
minimized by the balancing threshold. Let $v_b$ denote the balancing
threshold; as a comparison, we can have an optimal threshold $v_o$,
which is defined by

$$v_o = \arg\min_{v} N_e(v).$$

Unfortunately, it is almost impossible for us to know the cell-level
distributions for those cells that store 1 and those cells that store
0 without knowing the original word $x$. In this sense, the optimal
threshold $v_o$ is imaginary. Although we are not able to determine
$v_o$, the following result shows that the balancing threshold $v_b$
has performance comparable to that of $v_o$. Even in the worst case,
the number of errors introduced by $v_b$ is at most two times that
introduced by $v_o$, implying the suboptimality of the balancing
threshold $v_b$.

Theorem 1. Given any balanced codeword $x \in \{0, 1\}^n$ and
cell-level vector $c \in \mathbb{R}^n$, we have

$$N_e(v_b) \leq 2 N_e(v_o).$$

Proof: Given the balancing threshold $v_b$, the number of $0 \to 1$
errors equals the number of $1 \to 0$ errors; hence, the total number
of errors is

$$N_e(v_b) = 2 N^{(1 \to 0)}(v_b) = 2 N^{(0 \to 1)}(v_b).$$

If $v_o \geq v_b$, the number of $1 \to 0$ errors satisfies
$N^{(1 \to 0)}(v_o) \geq N^{(1 \to 0)}(v_b)$. Therefore,

$$N_e(v_b) \leq 2 N^{(1 \to 0)}(v_o) \leq 2 N_e(v_o).$$

Similarly, if $v_o < v_b$, by considering only $0 \to 1$ errors, we
get the same conclusion. ■
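The bound of Theorem 1 can be checked empirically. The sketch below is only an illustration under assumptions that are not part of the paper: cell levels are drawn from Gaussians with a drifted mean for bit 1 (the parameters `drift` and `sigma` are arbitrary), cell levels are distinct so that an exactly balancing threshold exists, and the optimal threshold is found by brute force over one candidate per gap between sorted levels. All function names are hypothetical.

```python
import random


def count_errors(x, c, v):
    """N_e(v): number of cells whose read bit (1 iff level > v) differs from x."""
    return sum(1 for xi, ci in zip(x, c) if xi != (1 if ci > v else 0))


def balancing_threshold(c):
    """Midpoint between the (n/2)-th and (n/2 + 1)-th largest levels (n even)."""
    ordered = sorted(c, reverse=True)
    half = len(c) // 2
    return (ordered[half - 1] + ordered[half]) / 2


def optimal_errors(x, c):
    """Brute-force minimum of N_e(v) over all distinguishable thresholds."""
    s = sorted(c)
    candidates = [s[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(s, s[1:])]
    candidates += [s[-1] + 1.0]
    return min(count_errors(x, c, v) for v in candidates)


def check_theorem1(n=20, trials=200, drift=0.4, sigma=0.3, seed=1):
    """Return True if N_e(v_b) <= 2 * N_e(v_o) holds in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [1] * (n // 2) + [0] * (n // 2)  # a balanced codeword
        rng.shuffle(x)
        # Assumed noise model: bit 0 centred at 0, bit 1 centred at 1 - drift.
        c = [rng.gauss(xi * (1.0 - drift), sigma) for xi in x]
        if count_errors(x, c, balancing_threshold(c)) > 2 * optimal_errors(x, c):
            return False
    return True


print(check_theorem1())
```

The brute-force search is exhaustive because the read word only changes when the threshold crosses a cell level, so one threshold per gap (plus one below and one above all levels) covers every possible reading.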

Now we compare the balancing threshold $v_b$ with a fixed threshold,
denoted by $v_f$. As shown in Fig. 3, if we set the reading threshold
to the fixed value $v_f = \frac{1}{2}$, it introduces many more errors
than the balancing threshold. Given a fixed threshold $v_f$, after a
long duration, we can characterize the storage channel as a binary
asymmetric channel, as shown in Fig. 4(a), where $p_1 > p_2$. Balanced
modulation is in effect a process of modifying the channel to make it
symmetric. As a result, balanced modulation yields a binary symmetric
channel with crossover probability $p$ such that $p_2 < p < p_1$. When
$p_2 \ll p_1$, we have $p - p_2 \ll p_1 - p$. In this case, the bit
error rate is reduced from $\frac{p_1 + p_2}{2}$ to $p$, where
$p \ll \frac{p_1 + p_2}{2}$.

Fig. 4. Balanced modulation to turn a binary asymmetric channel with
crossover probabilities $p_1 > p_2$ into a binary symmetric channel
with $p_2 < p < p_1$.



IV. Bit-Error-Rate Analysis 

To better understand different types of reading thresholds as well as
their performance, we study them from the expectation (statistical)
perspective. Assume that we write $n$ bits (including $n/2$ ones) into
a block at time 0. Let $g_t(v)$ denote the probability density
function (p.d.f.) of the level at time $t$ of a cell that stores a bit
0, and let $h_t(v)$ denote the p.d.f. of the level at time $t$ of a
cell that stores a bit 1. Then at time $t$, the bit error rate of the
block based on a reading threshold $v$ is given by

$$p_e(v) = \frac{1}{2} \int_{v}^{\infty} g_t(u)\,du + \frac{1}{2} \int_{-\infty}^{v} h_t(u)\,du.$$

According to our definition, a balancing threshold $v_b$ is chosen
such that $N^{(1 \to 0)}(v_b) = N^{(0 \to 1)}(v_b)$, i.e., the number
of $1 \to 0$ errors is equal to the number of $0 \to 1$ errors. As the
block length $n$ becomes sufficiently large, we can approximate
$N^{(1 \to 0)}(v_b)$ as $\frac{n}{2} \int_{-\infty}^{v_b} h_t(u)\,du$
and approximate $N^{(0 \to 1)}(v_b)$ as
$\frac{n}{2} \int_{v_b}^{\infty} g_t(u)\,du$. So when $n$ is large, we
approximately have

$$\int_{v_b}^{\infty} g_t(u)\,du = \int_{-\infty}^{v_b} h_t(u)\,du.$$

In contrast, an optimal reading threshold $v_o$ is one that minimizes
the total number of errors. When $n$ is large, we approximately have

$$v_o = \arg\min_{v} p_e(v).$$

When $g_t(v)$ and $h_t(v)$ are continuous functions, the solutions for
$v_o$ are

$$v_o = \pm\infty \quad \text{or} \quad g_t(v_o) = h_t(v_o).$$

That means $v_o$ is one of the intersections of $g_t(v)$ and $h_t(v)$
or one of the infinity points.
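The intersection condition follows from differentiating the bit error rate with respect to the threshold. A short derivation, assuming only that $g_t$ and $h_t$ are continuous (the step is not spelled out in the text above), is:

```latex
\frac{d p_e(v)}{dv}
  = \frac{1}{2}\,\frac{d}{dv}\int_{v}^{\infty} g_t(u)\,du
  + \frac{1}{2}\,\frac{d}{dv}\int_{-\infty}^{v} h_t(u)\,du
  = \frac{1}{2}\bigl(h_t(v) - g_t(v)\bigr).
```

Setting the derivative to zero gives $g_t(v) = h_t(v)$ at any interior minimizer; if no interior stationary point attains the minimum, the minimum is approached as $v \to \pm\infty$.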

Generally, $g_t(v)$ and $h_t(v)$ vary across nonvolatile memories and
across blocks, and they have different dynamics over time. It is not
easy to find a perfect model to characterize $g_t(v)$ and $h_t(v)$,
but they follow two trends over time, and the change of a cell level
can be treated as a superposition of these two trends. First, due to
cell-level drift, the difference between the means of $g_t(v)$ and
$h_t(v)$ becomes smaller. Second, due to the existence of different
types of noise and disturbance, their variances increase over time. To
study the performance of balanced modulation, we consider the two
effects separately in some simple scenarios.

Example 1. Let $g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1 - t, \sigma)$, as illustrated in Fig. 5. We
assume that the fixed threshold is $v_f = \frac{1}{2}$, which
satisfies $g_0(v_f) = h_0(v_f)$.

In the above example, the cell-level distribution corresponding to bit
'1' drifts but its variance does not change. We have

$$v_b = v_o = \frac{1 - t}{2}, \quad v_f = \frac{1}{2}.$$

At time $t$, the bit error rate based on a reading threshold $v$ is

$$p_e(v) = \frac{1}{2} \Phi\left(-\frac{v}{\sigma}\right) + \frac{1}{2} \Phi\left(\frac{v - (1 - t)}{\sigma}\right),$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-s^2/2}\,ds$.

For different selections of reading thresholds, $p_e(v)$ is plotted in
Fig. 6. It shows that the balancing threshold and the optimal
threshold have the same performance, which is much better than the
performance of a fixed threshold. When cell levels drift, balanced
modulation can significantly reduce the bit error rate of a block.

Fig. 5. An illustration of the first model with
$g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1 - t, \sigma)$.

Fig. 6. Bit error rates as functions of time $t$, under the first
model with $g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1 - t, \sigma)$.

Example 2. Let $g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1, \sigma + t)$, as illustrated in Fig. 7. We
assume that the fixed threshold is $v_f = \frac{1}{2}$, which
satisfies $g_0(v_f) = h_0(v_f)$.

In this example, the variance of the cell-level distribution
corresponding to bit '1' increases as the time $t$ increases. The
optimal threshold $v_o$ satisfies $g_t(v_o) = h_t(v_o)$, and the
balancing threshold equalizes the two tail probabilities, so we have

$$\frac{1}{\sigma} e^{-\frac{v_o^2}{2\sigma^2}} = \frac{1}{\sigma + t} e^{-\frac{(1 - v_o)^2}{2(\sigma + t)^2}}, \quad v_b = \frac{\sigma}{2\sigma + t} = \frac{1}{2 + t/\sigma}.$$

At time $t$, the bit error rate based on a threshold $v$ is

$$p_e(v) = \frac{1}{2} \Phi\left(-\frac{v}{\sigma}\right) + \frac{1}{2} \Phi\left(\frac{v - 1}{\sigma + t}\right),$$

which is plotted in Fig. 8 for different thresholds. It shows that
balancing thresholds introduce far fewer errors than fixed thresholds
when bits '1' and '0' have different reliability (reflected by their
variances), although they introduce slightly more errors than optimal
thresholds.

Fig. 7. An illustration of the second model with
$g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1, \sigma + t)$.

Fig. 8. Bit error rates as functions of time $t$, under the second
model with $g_t(v) = \mathcal{N}(0, \sigma)$ and
$h_t(v) = \mathcal{N}(1, \sigma + t)$.

In practice, the cell-level distributions at a time t are much 
more complex than the simple Gaussian distributions, and 
the errors introduced are due to many complex mechanisms. 
However, the above analysis based on two simple models is still
useful, because it reflects the trends of the cell-level changes,
which is helpful for analyzing the time-dependent errors in
nonvolatile memories.
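The two Gaussian models of Examples 1 and 2 are easy to evaluate numerically. The following Python sketch is an illustration rather than a reproduction of the plotted curves: it assumes equiprobable bits, treats the second parameter of each normal distribution as a standard deviation, and uses the balancing thresholds $(1 - t)/2$ and $\sigma/(2\sigma + t)$ that follow from equalizing the two tail probabilities. The function names and the sample values of $t$ and $\sigma$ are hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ber(v, mean0, sd0, mean1, sd1):
    """Bit error rate at threshold v for equiprobable bits.

    A stored 0 is misread when its level exceeds v; a stored 1 is
    misread when its level does not exceed v.
    """
    return 0.5 * phi((mean0 - v) / sd0) + 0.5 * phi((v - mean1) / sd1)


def model1(t, sigma):
    """Example 1: g_t = N(0, sigma), h_t = N(1 - t, sigma)."""
    v_b = (1.0 - t) / 2.0
    return {
        "fixed": ber(0.5, 0.0, sigma, 1.0 - t, sigma),
        "balancing": ber(v_b, 0.0, sigma, 1.0 - t, sigma),
    }


def model2(t, sigma):
    """Example 2: g_t = N(0, sigma), h_t = N(1, sigma + t)."""
    v_b = sigma / (2.0 * sigma + t)
    return {
        "fixed": ber(0.5, 0.0, sigma, 1.0, sigma + t),
        "balancing": ber(v_b, 0.0, sigma, 1.0, sigma + t),
    }


for t in (0.0, 0.1, 0.2, 0.3):
    print(t, model1(t, 0.15), model2(t, 0.15))
```

At $t = 0$ both thresholds coincide at $\frac{1}{2}$, and as $t$ grows the fixed threshold falls behind the balancing threshold in both models.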

V. Implementation 

Balanced modulation can be easily implemented on the current
architecture of nonvolatile memories. The process described in the
previous sections can be treated as a hard-decision approach, where a
reading threshold is selected to separate all the cell levels into
zeros and ones. In this section, we discuss a few methods of
determining balancing thresholds quickly, as well as their
implementations in nonvolatile memories. Furthermore, we discuss a
soft-decision implementation of balanced modulation, in which we do
not read data based on a reading threshold and the decoder can access
all the cell levels (the cell-level vector $c$) directly. In this
case, we want to know how the prior information that the stored
codeword is balanced can help us increase the success rate of
decoding.

A. Balancing Threshold for Hard Decision 

Given a block of $n$ cells, assume their current levels are
$c = c_1 c_2 \ldots c_n$. Our problem is to determine a threshold
$v_b$ such that $\frac{n}{2}$ cells, or approximately $\frac{n}{2}$
cells, are read as ones. A trivial method is to sort all the $n$ cell
levels in decreasing order such that
$c_{i_1} \geq c_{i_2} \geq \ldots \geq c_{i_n}$. Then
$v_b = \frac{c_{i_{n/2}} + c_{i_{n/2+1}}}{2}$ is our desired balancing
threshold. The disadvantage of this method is that it needs
$O(n \log n)$ computational time, which may slow down the reading
speed when $n$ is large. To reduce the reading time, we hope that the
balancing threshold can be controlled by hardware.
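The sorting method can be sketched in a few lines of Python. This is a software illustration only, with hypothetical function names; it assumes an even number of cells and distinct levels around the midpoint, so that exactly half of the cells are read as ones.

```python
def balancing_threshold_by_sorting(levels):
    """Midpoint between the (n/2)-th and (n/2 + 1)-th largest cell levels."""
    n = len(levels)
    if n % 2 != 0:
        raise ValueError("the number of cells must be even")
    ordered = sorted(levels, reverse=True)  # O(n log n), as noted in the text
    half = n // 2
    return (ordered[half - 1] + ordered[half]) / 2.0


def read_block(levels, threshold):
    """Hard-decision read: a cell is read as 1 iff its level exceeds the threshold."""
    return [1 if c > threshold else 0 for c in levels]


levels = [0.9, 0.1, 0.7, 0.2, 0.8, 0.3]
v_b = balancing_threshold_by_sorting(levels)
print(v_b, read_block(levels, v_b))
```

Only the two middle order statistics are needed, so a selection algorithm could replace the full sort; the sort is kept here because it mirrors the description above.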

Half-interval search is a simple approach to determining the balancing
threshold. Assume it is known that $v_b \in [l_1, l_2]$ with
$l_1 < l_2$. First, we set the reading threshold to
$\frac{l_1 + l_2}{2}$, based on which a simple circuit can quickly
detect the number of ones in the resulting word, denoted by $k$. If
$k < \frac{n}{2}$, we reset the interval $[l_1, l_2]$ to
$[l_1, \frac{l_1 + l_2}{2}]$. If $k > \frac{n}{2}$, we reset the
interval $[l_1, l_2]$ to $[\frac{l_1 + l_2}{2}, l_2]$. Then we repeat
this procedure until we get a reading threshold such that
$k = \frac{n}{2}$ or $l_2 - l_1 < \epsilon$ for a reading precision
$\epsilon$.
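A minimal Python sketch of the half-interval search follows. The counting of ones stands in for the hardware counting circuit mentioned above, and the function name, the default precision `eps`, and the example levels are assumptions made for illustration.

```python
def half_interval_threshold(levels, low, high, eps=1e-6):
    """Bisect the reading threshold until the read word is balanced.

    `low` and `high` bound the balancing threshold; `eps` is the reading
    precision at which the search stops even if the word is not exactly
    balanced.
    """
    target = len(levels) // 2
    while True:
        v = (low + high) / 2.0
        ones = sum(1 for c in levels if c > v)  # stands in for the counting circuit
        if ones == target or high - low < eps:
            return v
        if ones < target:
            high = v  # too few ones: the threshold is too high
        else:
            low = v   # too many ones: the threshold is too low


drifted = [0.5, 0.05, 0.45, 0.1, 0.4, 0.15]
print(half_interval_threshold(drifted, 0.0, 1.0))
```

Each iteration halves the interval, so the number of reads is at most about $\log_2((l_2 - l_1)/\epsilon)$, independent of the block length.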

B. Relaxed Balancing Threshold 

Half-interval search is an iterative approach to determining the
balancing threshold such that the resulting word is well balanced. To
further reduce the reading time, we can relax the constraint on the
weight of the resulting word; namely, we can let the number of ones in
the resulting word be approximately $\frac{n}{2}$ instead of exactly
$\frac{n}{2}$.

For instance, we can simply set the balancing threshold as

$$v_b = \frac{\sum_{i=1}^{n} c_i}{n} = \mathrm{mean}(c).$$

Such a $v_b$ reflects the cell-level drift, and it can be easily
implemented by a simple circuit.

More precisely, we can treat $\mathrm{mean}(c)$ as the first-order
approximation; in this way, we write $v_b$ as

$$v_b = \mathrm{mean}(c) + a \left(\frac{1}{2} - \mathrm{mean}(c)\right)^2,$$

where $a$ is a constant depending on the noise model of the memory
devices.
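The relaxed thresholds are one-line computations. The sketch below assumes cell levels normalized so that the nominal levels for 0 and 1 are 0 and 1 (hence the reference value $\frac{1}{2}$), and the value of the constant `a` in the usage line is purely hypothetical since it depends on the device noise model.

```python
def mean_threshold(levels):
    """First-order relaxed balancing threshold: the mean cell level."""
    return sum(levels) / len(levels)


def second_order_threshold(levels, a):
    """Mean threshold plus a quadratic correction with device constant `a`."""
    m = mean_threshold(levels)
    return m + a * (0.5 - m) ** 2


drifted = [0.0, 0.5, 0.0, 0.5]
print(mean_threshold(drifted), second_order_threshold(drifted, a=1.0))
```

When the block has not drifted, the mean is $\frac{1}{2}$ and the correction term vanishes, so both thresholds reduce to the fixed threshold.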

C. Prior Probability for Soft Decision 

Reading data based on hard decision is preferred in nonvolatile
memories, owing to its advantages in reading speed and computational
complexity compared to soft-decision decoding. However, on some
occasions, soft-decision decoding is still useful for increasing the
decoding success rate. We demonstrate that the prior knowledge that
the stored codewords are balanced can help us better estimate the
cell-level probability distributions for 0 and 1. Hence, it leads to
better soft-decoding performance.

We assume that, given a stored bit, either 0 or 1, its cell level is
Gaussian distributed. (We may also use some other distribution models
according to the physical properties of memory devices; our goal is to
have a better estimation of the model parameters.) Specifically, we
assume that the cell-level probability distribution for 0 is
$\mathcal{N}(u_0, \sigma_0)$ and the cell-level probability
distribution for 1 is $\mathcal{N}(u_1, \sigma_1)$. Since the
codewords are balanced, a cell is equally likely to store 0 or 1. So
we can describe the cell levels by a Gaussian mixture model. Our goal
is to find the maximum-likelihood parameters
$u_0, \sigma_0, u_1, \sigma_1$ based on the cell-level vector $c$,
namely, the parameters that maximize

$$P(c \mid u_0, \sigma_0, u_1, \sigma_1).$$

The expectation-maximization (EM) algorithm is an iterative method
that can be used to find the maximum-likelihood parameters
$u_0, \sigma_0, u_1, \sigma_1$. The EM iteration alternates between
performing an expectation (E) step and a maximization (M) step. Let
$x = x_1 x_2 \ldots x_n$ be the codeword stored in the current block,
and let $\lambda_t = [u_0(t), \sigma_0(t), u_1(t), \sigma_1(t)]$ be
the estimation of the parameters in the $t$th iteration. In the
E-step, it computes the probability of each cell storing 0 or 1 based
on the current estimation of the parameters; namely, for all
$i \in \{1, 2, \ldots, n\}$ and $k \in \{0, 1\}$, it computes

$$P(x_i = k \mid c_i, \lambda_t) = \frac{\frac{1}{\sigma_k(t)} e^{-\frac{(c_i - u_k(t))^2}{2\sigma_k(t)^2}}}{\sum_{j=0}^{1} \frac{1}{\sigma_j(t)} e^{-\frac{(c_i - u_j(t))^2}{2\sigma_j(t)^2}}}.$$

In the M-step, it computes the parameters that maximize the likelihood
given the probabilities obtained in the E-step. Specifically, for
$k \in \{0, 1\}$,

$$u_k(t + 1) = \frac{\sum_{i=1}^{n} P(x_i = k \mid c_i, \lambda_t)\, c_i}{\sum_{i=1}^{n} P(x_i = k \mid c_i, \lambda_t)},$$

$$\sigma_k(t + 1)^2 = \frac{\sum_{i=1}^{n} P(x_i = k \mid c_i, \lambda_t)\,(c_i - u_k(t + 1))^2}{\sum_{i=1}^{n} P(x_i = k \mid c_i, \lambda_t)}.$$

These estimations of the parameters are then used to determine the
distribution of $x_i$ in the next E-step.

Assume $u_0, \sigma_0, u_1, \sigma_1$ are the maximum-likelihood
parameters, based on which we can calculate the log-likelihood for
each variable $x_i$, that is,

$$\log f(c_i \mid x_i = 0) = \log \frac{1}{\sqrt{2\pi}\,\sigma_0} - \frac{(c_i - u_0)^2}{2\sigma_0^2},$$

$$\log f(c_i \mid x_i = 1) = \log \frac{1}{\sqrt{2\pi}\,\sigma_1} - \frac{(c_i - u_1)^2}{2\sigma_1^2},$$

where $f$ is the probability density function. Based on the
log-likelihood of each variable $x_i$, some soft-decoding algorithms
can be applied to read data, including message-passing algorithms
[13], linear programming [6], etc. This will be further discussed in
the next section for decoding balanced LDPC codes.
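The EM iteration with the mixing weights fixed at $\frac{1}{2}$ can be sketched as follows. This is a minimal illustration under the Gaussian assumption stated above, not a tuned estimator: the function names, the fixed iteration count, the small floor on the standard deviation, and the synthetic data in the usage lines are all assumptions.

```python
import math
import random


def gaussian_pdf(c, mean, sd):
    """Density of N(mean, sd), with sd a standard deviation."""
    return math.exp(-(c - mean) ** 2 / (2.0 * sd * sd)) / (math.sqrt(2.0 * math.pi) * sd)


def em_balanced_mixture(levels, init, iterations=50):
    """EM for a two-component Gaussian mixture with equal, fixed weights.

    `init` is (u0, s0, u1, s1); the refined (u0, s0, u1, s1) is returned.
    """
    u = [init[0], init[2]]
    s = [init[1], init[3]]
    for _ in range(iterations):
        # E-step: posterior of each cell storing 0 or 1 (the priors 1/2 cancel).
        post = []
        for c in levels:
            w0 = gaussian_pdf(c, u[0], s[0])
            w1 = gaussian_pdf(c, u[1], s[1])
            total = w0 + w1
            post.append((w0 / total, w1 / total) if total > 0 else (0.5, 0.5))
        # M-step: posterior-weighted mean and standard deviation per component.
        for k in (0, 1):
            weight = sum(p[k] for p in post)
            u[k] = sum(p[k] * c for p, c in zip(post, levels)) / weight
            var = sum(p[k] * (c - u[k]) ** 2 for p, c in zip(post, levels)) / weight
            s[k] = max(math.sqrt(var), 1e-6)  # floor avoids a collapsed component
    return u[0], s[0], u[1], s[1]


def log_likelihoods(c, params):
    """(log f(c | x = 0), log f(c | x = 1)) under the estimated parameters."""
    u0, s0, u1, s1 = params
    ll0 = -math.log(math.sqrt(2.0 * math.pi) * s0) - (c - u0) ** 2 / (2.0 * s0 * s0)
    ll1 = -math.log(math.sqrt(2.0 * math.pi) * s1) - (c - u1) ** 2 / (2.0 * s1 * s1)
    return ll0, ll1


rng = random.Random(7)
cells = [rng.gauss(0.0, 0.1) for _ in range(2000)] + [rng.gauss(0.7, 0.15) for _ in range(2000)]
params = em_balanced_mixture(cells, (0.0, 0.2, 1.0, 0.2))
print(params, log_likelihoods(0.05, params))
```

Fixing the weights at $\frac{1}{2}$ is where the balanced-codeword prior enters; a general mixture fit would have to estimate the weights as well. The per-cell log-likelihood pairs are the kind of input a soft decoder would consume.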

VI. Balanced LDPC Code 

Balanced modulation can significantly reduce the bit error rate of a
block in nonvolatile memories, but error correction is still
necessary. So we study the construction of balanced error-correcting
codes. In the programming step, we encode the information based on a
balanced error-correcting code and write it into a block. In the
reading step, the reading threshold is adjusted such that it yields a
balanced, but possibly erroneous, word. Then we pass this word to the
decoder to retrieve the original information.

A. Construction 

In this section, we introduce a simple construction of balanced
error-correcting codes, which is based on LDPC codes and is called the
balanced LDPC code. LDPC codes, first introduced by Gallager [7] in
1962 and rediscovered in the 1990s, achieve near-Shannon-bound
performance and allow reasonable decoding complexity. Our construction
of the balanced LDPC code is obtained by inverting the first $i$ bits
of each codeword in an LDPC code such that the codeword is balanced,
where $i$ is different for different codewords. It is based on Knuth's
observation [10], that is, given an arbitrary binary word of length
$k$ with $k$ even, one can always find an integer $i$ with
$0 \leq i < k$ such that inverting the first $i$ bits makes the word
balanced. Different from the current construction in [20], where $i$
is stored and protected by a lower-rate balanced error-correcting code
(the misdecoding of $i$ may lead to catastrophic error propagation in
the information word), we do not store $i$ in our construction. The
main idea is that certain redundancy exists in the codewords of LDPC
codes that enables us to locate $i$, or at least find a small set that
includes $i$ with a very high probability, even if some errors exist
in the codewords. It is wasteful to store the value of $i$ with a
lower-rate balanced error-correcting code. As a result, our
construction is more efficient than the recent construction proposed
in [20].

Fig. 9. Encoding of balanced LDPC codes.

Let $u$ be the message to encode, with length $k$. According to the
description above, the encoding procedure consists of two steps, as
shown in Fig. 9:

1) Apply an $(n, k)$ LDPC code $\mathcal{C}$ to encode the message $u$
into a codeword of length $n$, denoted by $z = Gu$, where $G$ is the
generator matrix of $\mathcal{C}$.

2) Find the minimal integer $i$ in $\{0, 1, \ldots, n - 1\}$ such that
inverting the first $i$ bits of $z$ results in a balanced word

$$x = z + 1^i 0^{n-i},$$

where $1^i 0^{n-i}$ denotes a run of $i$ bits 1 and $n - i$ bits 0.
Then we denote $x$ as $\phi(z)$. This word $x$ is a codeword of the
resulting balanced LDPC code, denoted by $\mathcal{C}_b$.

We see that a balanced LDPC code is constructed by simply balancing
the codewords of an LDPC code, which is called the original LDPC code.
Based on the procedure above, we can encode any message $u$ of length
$k$ into a balanced codeword $x$ of length $n$. The encoding procedure
is very simple, but how do we decode a received word? We now focus on
the decoding of this balanced LDPC code. Let $y$ be an erroneous word
received by the decoder; then the output of the maximum likelihood
decoder is

$$\hat{x} = \arg\min_{x \in \mathcal{C}_b} D(y, x),$$

where $D(y, x)$ is the distance between $y$ and $x$ depending on the
channel, for instance, the Hamming distance for binary symmetric
channels.

Fig. 10. Demonstration for the decoding of balanced LDPC codes.

The balanced code C is not a linear code, so the constraint 
$x \in C$ is not easy to deal with. A simpler way is to think 
about the codeword $z \in C$ of the original LDPC code that 
corresponds to x. By inverting the first j bits of y with 
$0 \le j < n$, we get a set of words $S_y$ of size n, namely, 

$$S_y = \{y^{(0)}, y^{(1)}, \ldots, y^{(n-1)}\},$$

in which 

$$y^{(j)} = y + 1^j 0^{n-j}$$

for all $j \in \{0, 1, 2, \ldots, n-1\}$. Then there exists an 
$i \in \{0, 1, 2, \ldots, n-1\}$ such that 

$$y^{(i)} - z = y - x.$$

The output of the maximum likelihood decoder is 

$$(\hat{z}, \hat{i}) = \arg\min_{z' \in C,\; i' \in \{0, 1, 2, \ldots, n-1\}} D(y^{(i')}, z')$$

subject to i' being the minimum integer that makes 
$z' + 1^{i'} 0^{n-i'}$ balanced. 

If we ignore the constraint that i has to be the minimum 
integer, then the output of the decoder is the codeword in C 
that has the minimum distance to $S_y$. Fig. 10 provides a simple 
demonstration, where the solid circles are the codewords 
of the LDPC code C and the triangles, connected by lines, are 
the words in $S_y$. Our goal is to find the solid circle that 
is closest to the set of triangles. This differs from 
traditional decoding of linear codes, whose goal is to find the 
closest codeword to a single point. 

B. An Extreme Case 

LDPC codes achieve near-Shannon-bound performance. A 
natural question is whether balanced LDPC codes retain this 
property. Certain difficulties exist in proving it by following 
the method in [8] (Sections 2 and 3), since balanced 
LDPC codes are not linear codes and the distance distributions 
of balanced LDPC codes are not easy to characterize. 
Fortunately, the statement appears to hold, because if the first i 
bits of a codeword have been inverted (we assume that the 
integer i is unknown), then the codeword can be recovered 
at little cost, i.e., with a very small number of additional 
redundant bits. 

Let us consider the ensemble of (n, a, b) parity-check 
matrices given by Gallager [8], each of which has a ones in each column, 
b ones in each row, and zeros elsewhere. According to this 
construction, the matrix is divided into a submatrices, each 
containing a single one in each column. All the submatrices are 
random column permutations of a matrix that has a single one 
in each column and b ones in each row. As a result, we have 
(n, a, b) LDPC codes. 

Theorem 2. Given a codeword z of an (n, a, b) LDPC code, 
we get 

$$x = z + 1^i 0^{n-i}$$

by inverting the first i bits of z with $0 \le i < n$. Let $P_e(x)$ 
be the error probability that z cannot be correctly recovered 
from x if i is unknown. As $n \to \infty$, 

$$P_e(x) \to 0,$$

for any integers a and b. 

Proof: Let H be the parity-check matrix of the LDPC 
code, and let 

$$y^{(j)} = x + 1^j 0^{n-j},$$

for all $j \in \{0, 1, 2, \ldots, n-1\}$. 

We can recover z from x if and only if 

$$H y^{(j)} \ne 0,$$

for all $j \ne i$ and $0 \le j \le n-1$. 
Hence, 

$$P_e(x) = P(\exists j \ne i \text{ s.t. } H y^{(j)} = 0) \le \sum_{j \ne i} P(H y^{(j)} = 0).$$

Let us first consider the case of j > i. We have $H y^{(j)} = 0$ 
if and only if 

$$H(y^{(j)} + z) = 0,$$

where 

$$y^{(j)} + z = 0^i 1^{j-i} 0^{n-j}.$$

So $H y^{(j)} = 0$ is equivalent to 

$$H(0^i 1^{j-i} 0^{n-j}) = 0.$$

As we described, H is constructed from a submatrices; 
namely, we can write H as 

$$H = \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_a \end{pmatrix}.$$

Let $H_s$ be one of the a submatrices of H; then $H_s$ contains 
a single one in each column and b ones in each row. It 
satisfies 

$$H_s(0^i 1^{j-i} 0^{n-j}) = 0,$$

i.e., in each row of $H_s$, there is an even number of ones from 
the (i+1)th column to the jth column. 

According to the construction of (n, a, b) LDPC codes, 

$$P(H_s(0^i 1^{j-i} 0^{n-j}) = 0) = P(H_s(1^{j-i} 0^{n-j+i}) = 0).$$

So we can use P(n, j-i) to denote $P(H_s(0^i 1^{j-i} 0^{n-j}) = 0)$. 
First, we consider the case that b is even. In this case, 

$$P(n, j-i) = P(n, n-j+i).$$

Hence, without loss of generality, we can assume that 
$j - i = d \le n/2$. 

It is easy to see that P(n, j-i) > 0 only if d is even. 
Assume that the one in the first column of $H_s$ is in the tth 
row, and let u be the number of ones in the tth row from the 
first j - i columns. Then we can get 



P(n,d)= Yl 

u=2,4,... 



,(— r 



) b - u P{n-b,d-u), 



71-1 



where P(n, d) = 1 if n = d or d = 0. 
If d< log n, then P[n,d) = 0{^). 
then 



If log 7i < d< f, 
b 



E 



U — 11 71 — 1 



w=2,4,... 

Iteratively, we can prove that 

P(n,d)=0((- 



77- 1 



b—u 



< 



1. logn 
-) 26 )■ 



Similarly, when j < i, we can get 
$P(H y^{(j)} = 0) \le P(n, i-j)$. 
Finally, we have 



, log 77 . 



^P{n,s) = 0{- r 

8=1 



So if b is even, as $n \to \infty$, $P_e(x) \to 0$. 

If b is odd, in each row there exists at least one 1 in the last 
n - j + i elements. As a result, $n - j + i \ge n/b$. Using the same 
idea as above, we can also prove that as $n \to \infty$, $P_e(x) \to 0$. 

So the statement in the theorem is true for any rate 
$R = 1 - a/b < 1$. This completes the proof. ■ 

The above theorem considers the extreme case in which the 
codeword of a balanced LDPC code has no errors; in that case, 
we can recover the original message with little cost 
in redundancy. It implies that balanced LDPC codes may 
achieve almost the same rates as the original unbalanced LDPC 
codes. In the following subsections, we discuss some decoding 
techniques for binary erasure channels and binary symmetric 
channels. Simulation results on these channels support the 
above statement. 

C. Decoding for Erasure Channels 

In this subsection, we consider binary erasure channels 
(BEC), where a bit (0 or 1) is either successfully received 
or erased, the latter denoted by "?". Let $y \in \{0, 1, ?\}^n$ be a word 
received by a decoder after transmitting a codeword $x \in C$ 
over a BEC. Then the key to decoding y is to determine the 
value of the integer i such that x can be obtained by inverting 
the first i bits of a codeword in C. 

A simple idea is to search all the possible values of i, i.e., we 
decode all the possible words $y^{(0)}, y^{(1)}, \ldots, y^{(n-1)}$ separately 
and select the best resulting codeword that satisfies all the 
constraints as the final output. This idea is straightforward, 
but the computational complexity of the decoding increases 
by a factor of n, which is not acceptable for most practical 
applications. 

Our observation is that we might be able to determine the 
value of i, or at least find a feasible set that includes i, based 
on the unerased bits in y. For example, given $x \in C$, assume 
that one parity-check constraint is 

$$x_{i_1} + x_{i_2} + \ldots + x_{i_4} = 0.$$

If all of $y_{i_1}, y_{i_2}, \ldots, y_{i_4}$ are observed (not erased), then we 
have the following statement about i: 

(1) If $y_{i_1} + y_{i_2} + \ldots + y_{i_4} = 0$, then 

$$i \in [0, i_1) \cup [i_2, i_3) \cup [i_4, n].$$

(2) If $y_{i_1} + y_{i_2} + \ldots + y_{i_4} = 1$, then 

$$i \in [i_1, i_2) \cup [i_3, i_4).$$
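
The observation above can be checked numerically. The sketch below is illustrative only: the function name `feasible_inversions` is hypothetical, positions are 1-based as in the text, and the parity check is assumed to be a constraint of the original (unbalanced) LDPC code whose bits at the listed positions are all unerased.

```python
def feasible_inversions(check_positions, y, n):
    """Return the set of i in [0, n] consistent with one fully observed check.

    check_positions: sorted 1-based positions i_1 < ... < i_b in the check.
    y: received bits (list of 0/1); position p is stored at y[p - 1].
    """
    parity = sum(y[p - 1] for p in check_positions) % 2
    feasible = set()
    for i in range(n + 1):
        # Inverting the first i bits flips every check position p <= i.
        inverted = sum(1 for p in check_positions if p <= i)
        if inverted % 2 == parity:
            feasible.add(i)
    return feasible


# Check on positions 2, 4, 5, 7 of a length-8 word.
print(sorted(feasible_inversions([2, 4, 5, 7], [0] * 8, 8)))
```

With observed parity 0 the result is the union of the even-indexed intervals, and with parity 1 it is the union of the odd-indexed intervals, in agreement with statements (1) and (2).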

By combining this observation with the message-passing 
algorithm, we get a decoding algorithm for balanced LDPC 
codes under BEC. As with the original LDPC code, we 
represent a balanced LDPC code as a sparse bipartite graph with 
n variable nodes and r check nodes, as shown in Fig. 11. 
Additionally, we add an inversion node representing the 
value or the feasible set of i. Let us describe a modified 
message-passing algorithm on this graph. In each round of 
the algorithm, messages are passed from variable nodes and 
inversion nodes to check nodes, and then from check nodes 
back to variable nodes and inversion nodes. 

We use $\mathcal{I}$ to denote the feasible set consisting of all possible 
values for the integer i, called the inversion set. At the first round, 
we initialize the jth variable node with $y_j \in \{0, 1, ?\}$ and initialize 
the inversion set as $\mathcal{I} = [0, n]$. Then we pass messages and 
update the graph iteratively. In each round, we perform the following 
operations. 



Fig. 11. Graph for balanced LDPC codes. 

(1) For each variable node v, if its value $x_v$ is in {0, 1}, it 
sends $x_v$ to all its check neighbors. If $x_v = ?$ and any incoming 
message u is 0 or 1, it updates $x_v$ as u and sends u to all its 
check neighbors. If $x_v = ?$ and all the incoming messages are 
?, it sends ? to all its check neighbors. 

(2) For each check node c, assume the messages from its 
variable neighbors are $x_{i_1}, x_{i_2}, \ldots, x_{i_b}$, where $i_1, i_2, \ldots, i_b$ are 
the indices of these variable nodes such that $i_1 < i_2 < \ldots < i_b$. 
Then we define 

$$S_c^0 = [0, i_1) \cup [i_2, i_3) \cup \ldots,$$

$$S_c^1 = [i_1, i_2) \cup [i_3, i_4) \cup \ldots$$

If all the incoming messages are in {0, 1}, then we update the 
inversion set $\mathcal{I}$ in the following way: if $x_{i_1} + x_{i_2} + \ldots + x_{i_b} = 0$, we update 
$\mathcal{I}$ as $\mathcal{I} \cap S_c^0$; otherwise, we update $\mathcal{I}$ as $\mathcal{I} \cap S_c^1$. In this case, 
this check node c is no longer useful, so we can remove it 
from the graph. 

(3) For each check node c, if there is exactly one incoming 
message from its variable neighbors that is $x_j = ?$, and all 
other incoming messages are in {0, 1}, we check whether $\mathcal{I} \subseteq S_c^0$ 
or $\mathcal{I} \subseteq S_c^1$. If $\mathcal{I} \subseteq S_c^0$, then the check node sends the XOR 
of the other incoming messages to $x_j$. If $\mathcal{I} \subseteq S_c^1$, then 
the check node sends the XOR of the other incoming messages 
plus one to $x_j$. In this case, the check node c is also 
no longer useful, so we can remove it from the 
graph. 

The procedure above continues until all erasures are filled 
in, or no erasures are filled in during the current iteration. Unlike 
the message-passing decoding algorithm for LDPC 
codes, where in each iteration both variable nodes and check 
nodes are processed only once, here we process variable nodes 
once but check nodes twice in each iteration. If all erasures 
are filled in, x is the binary vector labeled on the variable 
nodes. In this case, if $|\mathcal{I}| = 1$, then i is the only element in 
$\mathcal{I}$, and we can get $z \in C$ by calculating 

$$z = x + 1^i 0^{n-i}.$$

Fig. 12. The average size of the inversion set $\mathcal{I}$ after iterations in the 
message-passing algorithm for decoding balanced LDPC codes. 

Fig. 13. Word error rate of balanced LDPC codes and unbalanced LDPC 
codes when the erasure probability p = 0.35. 

If there are still some unknown erasures, we enumerate all 
the possible values in the inversion set $\mathcal{I}$ for the integer i. Usually, $|\mathcal{I}|$ is small. 
A specific i leads to a feasible solution z if 

(1) Given $\mathcal{I} = \{i\}$, with the message-passing procedure 
above, all the erasures can be filled in. 

(2) x is balanced, namely, the numbers of ones and zeros 
are equal for the variable nodes. 

(3) Let $z = x + 1^i 0^{n-i}$. Then i is the minimal integer in 
{0, 1, 2, ..., n} subject to $z + 1^i 0^{n-i}$ being balanced. 

We say that a word y with erasures is uniquely decodable if 
and only if there exists $i \in \mathcal{I}$ that leads to a feasible solution, 
and all such integers i result in the same solution 
$z \in C$. The following simple example 
demonstrates the decoding process. 

Example 3. Based on Fig. 11, we have a codeword x = 
01111000, which is transmitted over an erasure channel. We 
assume that the received word is y = 011110??. 
In the first round of the decoding, we have 

$$x^{(1)} = 011110??, \quad \mathcal{I} = [0, 8].$$

Considering the 2nd check node, we can update $\mathcal{I}$ as 

$$\mathcal{I} = \{0, 1, 4, 5\}.$$

Considering the 3rd check node, we can continue updating 
$\mathcal{I}$ as 

$$\mathcal{I} = \mathcal{I} \cap \{1, 2, 6, 7, 8\} = \{1\}.$$

Based on (3), we can fill in 0, 0 for the 7th and 8th variable 
nodes. Finally, we get z = 11111000 and i = 1. 

Regarding the decoding algorithm described above, there 
are two important issues to consider: the 
decoding complexity of the algorithm and its performance. 

First, the decoding complexity of the algorithm strongly depends 
on the size of the inversion set $\mathcal{I}$ when it finishes iterating. Fig. 12 
shows the simulated average size of $\mathcal{I}$ for decoding 
three balanced LDPC codes. It shows that when the erasure 
probability is lower than a threshold, the size of $\mathcal{I}$ is smaller 
than a constant with very high probability. In this case, the 
decoding complexity of the balanced LDPC code is very close 
to the decoding complexity of the original unbalanced LDPC 
code. 

The other issue is the performance of the decoding 
algorithm for balanced LDPC codes. In particular, we want 
to determine the cost in additional redundancy of correcting 
the inversion of the first i bits when i is unknown. Fig. 13 
presents the word error rate of balanced LDPC codes 
and the corresponding original unbalanced LDPC codes for 
different block lengths. It is interesting to see that as the block 
length increases, the balanced LDPC codes and the original 
unbalanced LDPC codes have almost the same performance; 
that is, the cost of correcting the inversion of the first i bits is 
negligible. 

D. Decoding for Symmetric Channels 

In this subsection, we study and analyze the decoding 
of balanced LDPC codes for symmetric channels, including 
binary symmetric channels (BSC) and AWGN (Additive White 
Gaussian Noise) channels. Unlike for binary erasure channels 
(BEC), here we are not able to determine a small set that 
definitely includes the integer i. Instead, we want to find 
the most likely values of i. Before presenting our decoding 
algorithm, we first introduce the belief-propagation algorithm for 
decoding LDPC codes. 

Belief propagation [13], where messages are passed iteratively 
across a factor graph, has been widely studied and 
recommended for the decoding of LDPC codes. In each 
iteration, each variable node passes messages (probabilities) 
to all the adjacent check nodes, and then each check node 
passes messages (beliefs) to all the adjacent variable nodes. 
Specifically, let $m_{vc}^{(\ell)}$ be the message passed from a variable 
node v to a check node c at the $\ell$th round of the algorithm, 
and let $m_{cv}^{(\ell)}$ be the message from a check node c to a variable 
node v. At the first round, $m_{vc}^{(0)}$ is the log-likelihood of the 
node v conditioned on its observed value, i.e., $\log \frac{P(y|x=0)}{P(y|x=1)}$ 
for variable x and its observation y. This value is denoted by 
$m_v$. Then the iterative update procedures can be described by 
the following equations 

$$m_{vc}^{(\ell)} = \begin{cases} m_v & \text{if } \ell = 0, \\ m_v + \sum_{c' \in N(v) \setminus c} m_{c'v}^{(\ell-1)} & \text{if } \ell \ge 1, \end{cases}$$

$$m_{cv}^{(\ell)} = 2 \tanh^{-1}\Big( \prod_{v' \in N(c) \setminus v} \tanh\big(m_{v'c}^{(\ell)}/2\big) \Big),$$

where N(v) is the set of check nodes that connect to variable 
node v and N(c) is the set of variable nodes that connect 
to check node c. In practice, the belief-propagation algorithm 
stops after a certain number of iterations or when the passed 
likelihoods are close to certainty. Typically, for a BSC with 
crossover probability p, the log-likelihood $m_v$ for each variable 
node v is a constant depending on p. Let x be the variable 
on v and let y be its observation; then 

$$m_v = \begin{cases} \log\frac{1-p}{p} & \text{if } y = 0, \\ -\log\frac{1-p}{p} & \text{if } y = 1. \end{cases}$$
Let us consider the decoding of balanced LDPC codes. 
Assume $x \in C$ is a codeword of a balanced LDPC code, 
obtained by inverting the first i bits of a codeword z in an 
LDPC code C. The erroneous word received by the decoder 
is $y \in \mathcal{Y}^n$ for an alphabet $\mathcal{Y}$. For example, $\mathcal{Y} = \{0, 1\}$ for 
BSC channels, and $\mathcal{Y} = \mathbb{R}$ for AWGN channels. Here, we 
consider a symmetric channel, i.e., a channel for which there 
exists a permutation $\pi$ of the output alphabet $\mathcal{Y}$ such that (1) 
$\pi^{-1} = \pi$, and (2) $P(y|1) = P(\pi(y)|0)$ for all $y \in \mathcal{Y}$, where 
P(y|x) is the probability of observing y when the input bit is 
x. 

The biggest challenge in decoding a received word 
$y \in \mathcal{Y}^n$ is the lack of information about where the 
inversion happens, i.e., the integer i. We let 

$$y^{(i)} = \pi(y_1)\pi(y_2)\ldots\pi(y_i)\,y_{i+1}\ldots y_n,$$

for all $i \in \{0, 1, 2, \ldots, n-1\}$. A simple idea is to search all 
the possibilities for the integer i from 0 to n - 1, i.e., to decode 
all the words 

$$y^{(0)}, y^{(1)}, \ldots, y^{(n-1)}$$

separately. Assume their decoding outputs based on belief 
propagation are 

$$\hat{z}^{(0)}, \hat{z}^{(1)}, \ldots, \hat{z}^{(n-1)};$$

then the final output of the decoder is $\hat{z} = \hat{z}^{(j)}$ such that 
$P(y^{(j)} | \hat{z}^{(j)})$ is maximized. The drawback of this method is 
its high computational complexity, which is about n times the 
complexity of decoding the original unbalanced LDPC code. 
To reduce computational complexity, we want to estimate the 
value of i in a simpler and faster way, even at the cost of a 
slightly higher bit error rate. 



The idea is that when we use belief propagation to decode 
a group of words $y^{(0)}, y^{(1)}, \ldots, y^{(n-1)}$, some information 
can be used to roughly compare their goodness, namely, their 
distances to the nearest codewords. To find such information, 
given each word $y^{(j)}$ (here, we denote it as y for simplicity), 
we run belief propagation for $\ell$ rounds (iterations), where $\ell$ is 
very small, e.g., $\ell = 2$. There are several ways of estimating 
the goodness of y, and we introduce one of them as follows. 

Given a word y, we define 

$$\Lambda(y, \ell) = \sum_{c \in C} \prod_{v \in N(c)} \tanh\big(m_{vc}^{(\ell)}/2\big),$$

where C is the set of all the check nodes, N(c) is the set 
of neighbors of a check node c, and $m_{vc}^{(\ell)}$ is the message 
passed from a variable node v to a check node c at the $\ell$th 
round of the belief-propagation algorithm. Roughly, $\Lambda(y, \ell)$ is 
a measurement of the number of correct parity checks for the 
current assignment in belief propagation (after $\ell - 1$ iterations). 
For instance, 

$$\Lambda(y, \ell = 1) = \alpha\,(r - 2|Hy|),$$

for a binary symmetric channel. In this expression, $\alpha$ is a 
constant, r = n - k is the number of redundant bits, and |Hy| 
is the number of ones in Hy, i.e., the number of unsatisfied 
parity checks. 

Generally, the bigger $\Lambda(y^{(j)}, \ell)$ is, the more likely it is that j = i. 
So we can get the most likely i by calculating 

$$\hat{i} = \arg\max_{0 \le j \le n-1} \Lambda(y^{(j)}, \ell).$$

Then we decode $y^{(\hat{i})}$ as the final output. However, the 
procedure requires calculating $\Lambda(y^{(j)}, \ell)$ for all $0 \le j \le n-1$. 
The following theorem shows that the task of computing all 
$\Lambda(y^{(j)}, \ell)$ with $0 \le j \le n-1$ can be finished in linear time 
if $\ell$ is a small constant. 

Theorem 3. The task of computing all $\Lambda(y^{(j)}, \ell)$ with $0 \le j \le 
n-1$ can be finished in linear time if $\ell$ is a small constant. 

Proof: First, we calculate $\Lambda(y^{(0)}, \ell)$. Based on the 
belief-propagation algorithm described above, it can be finished in 
O(n) time. In this step, we save all the messages, including 
$m_v$, $m_{cv}^{(l)}$, $m_{vc}^{(l)}$ for all $c \in C$, $v \in V$ and $1 \le l \le \ell$. 

When we calculate $\Lambda(y^{(1)}, \ell)$, the only change in the inputs 
is $m_{v_1}$, where $v_1$ is the first variable node (the sign of $m_{v_1}$ is 
flipped). As a result, we do not have to recalculate all $m_v$, $m_{cv}^{(l)}$, 
$m_{vc}^{(l)}$ for all $c \in C$, $v \in V$ and $1 \le l \le \ell$. Instead, we only 
need to update those messages that are related to $m_{v_1}$. It 
should be noted that the number of messages related to $m_{v_1}$ 
has an exponential dependence on $\ell$, so the value of $\ell$ should 
be small. In this case, based on the calculation of $\Lambda(y^{(0)}, \ell)$, 
$\Lambda(y^{(1)}, \ell)$ can be calculated in constant time. Similarly, each 
of $\Lambda(y^{(j)}, \ell)$ with $2 \le j \le n-1$ can be obtained iteratively 
in constant time. 

Based on the process above, we can compute all $\Lambda(y^{(j)}, \ell)$ 
with $0 \le j \le n-1$ in O(n) time. ■ 
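
The incremental idea in the proof can be illustrated for the simplest case, $\ell = 1$ on a BSC, where $\Lambda(y^{(j)}, 1)$ is a decreasing affine function of the number of unsatisfied checks (assuming the constant $\alpha$ is positive). The sketch below is an assumption-laden illustration, not the authors' code: the names `unsatisfied_counts` and `most_likely_inversion` are hypothetical, the parity-check matrix is given as a list of rows of 0-based column indices, and only syndrome weights are tracked rather than full belief-propagation messages.

```python
def unsatisfied_counts(h_rows, y):
    """counts[j] = number of unsatisfied checks of y with its first j bits
    inverted, for j = 0, ..., n-1, computed with incremental updates."""
    n = len(y)
    checks_of = [[] for _ in range(n)]  # checks touching each variable
    for c, cols in enumerate(h_rows):
        for v in cols:
            checks_of[v].append(c)
    syndrome = [sum(y[v] for v in cols) % 2 for cols in h_rows]
    unsat = sum(syndrome)
    counts = [unsat]
    for j in range(1, n):
        # Moving from y^(j-1) to y^(j) flips bit j-1 only, which toggles
        # just the checks adjacent to that variable node.
        for c in checks_of[j - 1]:
            syndrome[c] ^= 1
            unsat += 1 if syndrome[c] else -1
        counts.append(unsat)
    return counts


def most_likely_inversion(h_rows, y):
    """Index j minimizing the number of unsatisfied checks of y^(j)."""
    counts = unsatisfied_counts(h_rows, y)
    return min(range(len(counts)), key=counts.__getitem__)
```

For a regular code each variable touches a constant number of checks, so each step costs constant time and the whole sweep is linear in n.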

To increase the success rate of decoding, we can also create 
a set of most likely values for i, denoted by $\mathcal{I}_c$. $\mathcal{I}_c$ consists of 
at most c local maximums with the highest values of $\Lambda(y^{(j)}, \ell)$. 
Here, we say that $j \in \{0, 1, 2, 3, \ldots, n-1\}$ is a local maximum 
if and only if 

$$\Lambda(y^{(j)}, \ell) > \Lambda(y^{(j-1)}, \ell), \quad \Lambda(y^{(j)}, \ell) > \Lambda(y^{(j+1)}, \ell).$$

Note that $\mathcal{I}_1 = \{\hat{i}\}$, where $\hat{i}$ is the global maximum as 
defined above. If c > 1, for all $j \in \mathcal{I}_c$, we decode $y^{(j)}$ 
separately and choose the output with the maximum likelihood 
as the final output of the decoder. It is easy to see that 
the above modified belief-propagation algorithm for balanced 
LDPC codes has asymptotically the same decoding complexity 
as the belief-propagation algorithm for LDPC codes, that is, 
O(n log n). 

Fig. 14 shows the performance of the above algorithm 
for decoding balanced LDPC codes under BSC and the 
performance of the belief-propagation algorithm for the original 
LDPC codes. We see that when $\ell = 2$ and c = 4, 
the performance gap between the balanced (280, 4, 7) LDPC code 
and the unbalanced (280, 4, 7) LDPC code is very small. This 
comparison implies that the cost of correcting the inversion of 
the first i bits (when i is unknown) is small for LDPC codes. 

Let us go back to the scheme of balanced modulation. The 
following examples give the log-likelihood of each variable 
node when the reading process is based on hard decision and 
soft decision, respectively. Based on them, we can apply the 
modified belief-propagation algorithm in balanced modulation. 

Example 4. If the reading process is based on hard decision, 
then it results in a binary symmetric channel with crossover 
probability p. In this case, letting y be the observation on a 
variable node v, the log-likelihood for v is 

$$m_v = \begin{cases} \log\frac{1-p}{p} & \text{if } y = 0, \\ -\log\frac{1-p}{p} & \text{if } y = 1. \end{cases}$$

Example 5. If the reading process is based on soft decision, 
then we can approximate cell-level distributions by Gaussian 
distributions, which are characterized by 4 parameters 
$u_0, \sigma_0, u_1, \sigma_1$. These parameters can be obtained based on 
the cell-level vector y = c, following the steps in Subsection 
IV-C. In this case, if the input of the decoder is y, then the 
log-likelihood of the ith variable node v is 

$$\lambda_i = \log \frac{\frac{1}{\sigma_0} e^{-\frac{(c_i - u_0)^2}{2\sigma_0^2}}}{\frac{1}{\sigma_1} e^{-\frac{(c_i - u_1)^2}{2\sigma_1^2}}},$$

where $c_i$ is the current level of the ith cell. If the input of the 
decoder is $y^{(j)}$ (we don't have to care about its exact value), 
then the log-likelihood of the ith variable node v is 

$$\begin{cases} \lambda_i & \text{if } i > j, \\ -\lambda_i & \text{if } i \le j, \end{cases}$$

for all $0 < i \le n$. 
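
The two log-likelihood rules in Examples 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and argument names are hypothetical, the soft-decision formula is the ordinary log ratio of two Gaussian densities with means u0, u1 and standard deviations s0, s1, and the prefix inversion negates the log-likelihoods of the first j cells.

```python
import math


def llr_hard(y, p):
    """Log-likelihood of a bit observed as y over a BSC with crossover p."""
    base = math.log((1 - p) / p)
    return base if y == 0 else -base


def llr_soft(c, u0, s0, u1, s1):
    """Log ratio of Gaussian densities N(u0, s0^2) and N(u1, s1^2) at level c."""
    return (math.log(s1 / s0)
            - (c - u0) ** 2 / (2 * s0 ** 2)
            + (c - u1) ** 2 / (2 * s1 ** 2))


def llrs_after_prefix_inversion(llrs, j):
    """Log-likelihoods for y^(j): the first j entries change sign."""
    return [-l for l in llrs[:j]] + list(llrs[j:])
```

A positive value favors bit 0 and a negative value favors bit 1, so negating an entry is the soft-decision counterpart of inverting the corresponding bit.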



VII. Partial-Balanced Modulation 

Constructing balanced error-correcting codes is more difficult 
than constructing normal error-correcting codes. A natural 
question is whether it is possible to design schemes that achieve 
performance similar to that of balanced modulation and have 
simple error-correcting code constructions. With this motivation, 
we propose a variant of balanced modulation, called 
partial-balanced modulation. The main idea is to construct an 
error-correcting code whose codewords are partially balanced, 
namely, only a certain segment of each codeword is balanced. 
When reading information from a block, we adjust the reading 
threshold so that this segment of the resulting word is 
balanced or approximately balanced. 

Fig. 15. Partial balanced code. 

One way of constructing partial-balanced error-correcting 
codes is shown in Fig. 15. Given an information vector u 
of k bits (k is even), according to Knuth's observation [10], 
there exists an integer i with $0 \le i < k$ such that inverting 
the first i bits of u results in a balanced word $\tilde{u}$. Since our 
goal is to construct a codeword that is partially balanced, 
it is not necessary to present i in a balanced form. Now, 
we use $\mathbf{i}$ to denote the binary representation of length $\lceil \log_2 k \rceil$ 
for i. To further correct potential errors, we consider $[\tilde{u}, \mathbf{i}]$ 
as the information part and add extra parity-check bits by 
applying a systematic error-correcting code, such as a BCH code 
or a Reed-Solomon code. As a result, we obtain a codeword 
$x = [\tilde{u}, \mathbf{i}, r]$, where r is the redundancy part. In this codeword, 
$\tilde{u}$ is balanced and $[\mathbf{i}, r]$ is not balanced. 

Note that in most data-storage applications, the bit error rate 
of a block is usually very small. The application of modulation 
schemes can further reduce the bit error rate. Hence, the 
number of errors in real applications is usually much smaller 
than the block length. In this case, the total length of the index 
and redundancy parts is 
smaller or much smaller than the code dimension k. As the 
block length n becomes large, e.g., one thousand, the reading 
threshold determined by partial-balanced modulation is almost 
the same as the one determined by balanced modulation. One 
assumption that we made is that all the cells in the same 
block have similar noise properties. To make this assumption 
sound, we can reorder the bits in the codeword x such 
that the k cells storing the balanced segment are (approximately) randomly 
distributed among all the n cells. Compared to balanced 
modulation, partial-balanced modulation can achieve almost 
the same performance, and its code construction is much easier 
(the constraints on the codewords are relaxed). The following 
two examples compare the partial-balanced modulation 
scheme with the traditional one based on a fixed threshold. 

Fig. 14. Word error rate of (280, 4, 7) LDPC codes with a maximum of 50 iterations. 

Example 6. Let us consider a nonvolatile memory with block 
length n = 255. To guarantee data reliability, each block 
has to correct 18 errors if the reading process is based 
on a fixed reading threshold. Assume the (255, 131) primitive 
BCH code is applied for correcting errors; then the data 
rate (defined as the ratio between the number of available 
information bits and the block length) is 

$$\frac{131}{255} = 0.5137.$$

Example 7. For the block discussed in the previous example, 
we assume that it only needs to correct 8 errors based 
on partial-balanced modulation. In this case, we can apply the 
(255, 191) primitive BCH code for correcting errors. Of its 191 
information bits, 8 bits are used to store the index i, so the 
data rate is 

$$\frac{191 - 8}{255} = 0.7176,$$

which is much higher than the one obtained in the previous 
example. 
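
The arithmetic behind the two data rates can be reproduced with a few lines. This is a sketch under one assumption that the stated value suggests but the text does not spell out: the figure 0.7176 equals 183/255, which is consistent with 8 of the 191 BCH information bits being reserved for the index i. The function name is hypothetical.

```python
def data_rate(ecc_dimension, block_length, index_bits=0):
    """Ratio of available information bits to block length.

    index_bits is the (assumed) number of information bits spent on
    storing the inversion index in partial-balanced modulation.
    """
    return (ecc_dimension - index_bits) / block_length


rate_fixed = data_rate(131, 255)                   # fixed threshold, t = 18
rate_partial = data_rate(191, 255, index_bits=8)   # partial-balanced, t = 8
print(f"{rate_fixed:.4f} {rate_partial:.4f}")
```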

The reading/decoding process of partial-balanced modulation 
is straightforward. First, the reading threshold $v_b$ is 
adjusted such that among the cells corresponding to the balanced 
segment $\tilde{u}$ there are k/2 cells or approximately k/2 cells with 
levels higher than $v_b$. Based on this reading threshold $v_b$, the whole block 
is read as a binary word y, which can be further decoded as 
$[\tilde{u}, \mathbf{i}]$ if the total number of errors is well bounded. Then we 
obtain the original message u by inverting the first i bits of 
$\tilde{u}$. 
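
The reading steps can be sketched as follows, leaving out the error-correcting decoder. Everything here is illustrative: the function names are hypothetical, cell levels are plain floats, the threshold is simply placed between the two middle levels of the cells that store the balanced segment, and a cell is read as 1 when its level exceeds the threshold.

```python
def balancing_threshold(levels, segment_positions):
    """Threshold that puts half of the segment's cells above it."""
    vals = sorted(levels[p] for p in segment_positions)
    k = len(vals)
    if k % 2:
        raise ValueError("segment length must be even")
    return (vals[k // 2 - 1] + vals[k // 2]) / 2


def read_block(levels, threshold):
    """Hard-decision read of the whole block with one threshold."""
    return [1 if c > threshold else 0 for c in levels]


def undo_prefix_inversion(segment, i):
    """Recover the message by inverting the first i bits of the segment."""
    return [1 - b for b in segment[:i]] + list(segment[i:])


levels = [0.9, 0.2, 0.8, 0.1, 0.4, 0.7]
v_b = balancing_threshold(levels, [0, 1, 2, 3])
print(read_block(levels, v_b))
```

Only the cells of the balanced segment determine the threshold; the remaining cells are read with the same threshold, which is why the text assumes that all cells in a block have similar noise properties.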

VIII. Balanced Codes for Multi-Level Cells 

In order to maximize the storage capacity of nonvolatile 
memories, multi-level cells (MLCs) are used, where a cell of 
q discrete levels can store $\log_2 q$ bits [3]. Flash memories with 
4 and 8 levels have been used in products, and MLCs with 16 
levels have been demonstrated in prototypes. For PCMs, cells 
with 4 or more levels have been in development. 

The idea of balanced modulation and partial-balanced modulation 
can be extended to multi-level cells. For instance, if 
each cell has 4 levels, we can construct a balanced code in 
which each codeword has the same number of 0s, 1s, 2s, 
and 3s. When reading data from the block, we adjust three 
reading thresholds such that the resulting word also has the 
same number of 0s, 1s, 2s, and 3s. The key question is how 
to construct balanced codes or partial-balanced codes for an 
alphabet size q > 2. 

A. Construction based on Rank 

A simple approach to constructing balanced codes for 
the nonbinary case is to consider the message as the rank 
of its codeword among all its permutations, based on the 
lexicographical order. If the message is $u \in \{0, 1\}^k$, then the 
codeword length n is the minimum integer such that n = qm 
and $\binom{qm}{m, m, \ldots, m} \ge 2^k$. The following examples 
demonstrate the encoding and decoding processes. 

Example 8. Assume the message is u = 1010010010 of length 
10 and q = 3. Since $\binom{9}{3, 3, 3} = 1680 > 2^{10}$, we can convert u 
to a balanced word x of length 9 and alphabet size q = 3. 
Let S denote the set that consists of all the balanced words 
of length 9 and alphabet size q = 3. To map u into a word in 
S, we write u in the decimal form r = 658 and let r be the 
rank of x in S based on the lexicographical order. 

Let us consider the first symbol of x. In S, there are in total 
$\binom{8}{2, 3, 3} = 560$ sequences starting with 0, or 1, or 2. Since 
$560 \le r < 560 + 560$, the first symbol in x is 1; then 
we update r as r - 560 = 98, which is the rank of x among 
all the sequences starting with 1. 

Let us consider the second symbol of x. There are in total 
$\binom{7}{2, 2, 3} = 210$ sequences starting with 10, which is larger than 
r, so the second symbol of x is 0. 

Repeating this process, we can convert u into a balanced 
word x = 101202102. 
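
The rank-based mapping of this example can be sketched as follows. This is a straightforward, unoptimized illustration (not the fast algorithm cited later in the text); the function names `multinomial`, `unrank_balanced` and `rank_balanced` are hypothetical, and ranks are counted from 0 in lexicographical order.

```python
from math import factorial


def multinomial(counts):
    """Number of distinct sequences with the given symbol counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def unrank_balanced(r, q, m):
    """Balanced word of length q*m over {0..q-1} with lexicographic rank r."""
    counts = [m] * q
    word = []
    for _ in range(q * m):
        for s in range(q):
            if counts[s] == 0:
                continue
            counts[s] -= 1
            block = multinomial(counts)  # words continuing with symbol s
            if r < block:
                word.append(s)
                break
            r -= block
            counts[s] += 1
        else:
            raise ValueError("rank out of range")
    return word


def rank_balanced(word, q):
    """Lexicographic rank of word among permutations of its own symbols."""
    counts = [word.count(s) for s in range(q)]
    r = 0
    for sym in word:
        for s in range(sym):
            if counts[s]:
                counts[s] -= 1
                r += multinomial(counts)
                counts[s] += 1
        counts[sym] -= 1
    return r


x = unrank_balanced(int("1010010010", 2), q=3, m=3)
print("".join(map(str, x)), rank_balanced(x, 3))
```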

Example 9. We use the same notation as in the above example. 
Given x = 101202102, it is easy to calculate its rank in S 
based on the lexicographical order (via enumerative source 
coding [5]). It is 

$$r = \binom{8}{2, 3, 3} + \binom{6}{1, 2, 3} + \binom{5}{1, 1, 3} + \binom{5}{2, 0, 3} + \binom{3}{0, 1, 2} + \binom{3}{1, 0, 2} + \binom{2}{0, 1, 1} = 658,$$

where $\binom{8}{2, 3, 3}$ is the number of x's permutations starting 
with 0, $\binom{6}{1, 2, 3}$ is the number of x's permutations starting 
with 100, ... 

Then from r, we can get its binary representation u = 
1010010010. In [16], Ryabko and Matchikina showed that 
if the length of x is n, then we can get the message u in 
$O(n \log^3 n \log\log n)$ time. 

The above approach is simple and information efficient, but 
the encoding is not computationally fast. 


B. Generalizing Knuth's Construction 

An alternative approach is to generalize Knuth's idea to 
the nonbinary case due to its operational simplicity. Generally, 
assume that we are given a word $u \in G_q^k$ with 
$G_q = \{0, 1, 2, \ldots, q-1\}$ and k = qm; our goal is to generalize 
Knuth's idea to make u balanced. 

Let us consider a simple case, q = 4. Given a word $u \in G_4^k$, 
we let $n_i$ with $0 \le i \le 3$ denote the number of i's in u. To 
balance all the cell levels, we first balance the total number 
of 0s and 1s, such that $n_0 + n_1 = 2m$. It also results in 
$n_2 + n_3 = 2m$. To do this, we can treat 0 and 1 as one 
state and treat 2 and 3 as another state. Based on 
Knuth's idea, there always exists an integer i such that 
operating on the first i symbols ($0 \to 2$, $1 \to 3$, $2 \to 0$, $3 \to 1$) 
yields $n_0 + n_1 = 2m$. We then consider the subsequence 
consisting of 0s and 1s, whose length is 2m. By applying 
Knuth's idea, we can make this subsequence balanced. 
Similarly, we can also balance the subsequence consisting of 
2s and 3s. Consequently, we convert any word in $G_4^k$ into a 
balanced word. In order to decode this word, three additional 
integers of length at most $\lceil \log_2 k \rceil$ need to be stored, indicating 
the locations where the operations are applied. The following example 
demonstrates this procedure. 

Example 10. Assume u = 0110230210110003; we convert it 
into a balanced word with the following steps: 

(1) Operating on the first 4 symbols in u yields 
2332230210110003, where $n_0 + n_1 = 8$. 

(2) Consider the subsequence of 0s and 1s in 
2332230210110003, i.e., 01011000. Operating on the 
first bit of this subsequence ($0 \to 1$, $1 \to 0$) yields 
2332231210110003, where $n_0 = n_1 = 4$. 

(3) Consider the subsequence of 2s and 3s in 
2332231210110003, i.e., 23322323. It is already balanced, so 
we operate on the first 0 symbols of this subsequence 
($2 \to 3$, $3 \to 2$), which leaves 
2332231210110003, which is balanced. 

To recover 0110230210110003 from 2332231210110003 
(the inverse process), we need to record the three integers 
[4, 1, 0], whose binary lengths are $[\log_2 16, \log_2 8, \log_2 8]$. 
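
The three steps of this example can be sketched for q = 4 as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, symbols are integers 0..3, the group swap ($0 \leftrightarrow 2$, $1 \leftrightarrow 3$) is realized as XOR with 2, the within-group swap ($0 \leftrightarrow 1$, $2 \leftrightarrow 3$) as XOR with 1, and each index is taken to be the minimal one.

```python
def knuth_index(bits):
    """Minimal i such that inverting the first i bits balances the word."""
    n, ones = len(bits), sum(bits)
    for i in range(n):
        if 2 * ones == n:
            return i
        ones += 1 - 2 * bits[i]
    raise ValueError("length must be even")


def balance_q4(u):
    """Return (balanced word, (i0, i1, i2)) for a word over {0, 1, 2, 3}."""
    w = list(u)
    # Step 1: balance group {0,1} against group {2,3}.
    i0 = knuth_index([s >> 1 for s in w])
    for p in range(i0):
        w[p] ^= 2
    # Step 2: balance 0s against 1s inside their subsequence.
    low = [p for p, s in enumerate(w) if s < 2]
    i1 = knuth_index([w[p] for p in low])
    for p in low[:i1]:
        w[p] ^= 1
    # Step 3: balance 2s against 3s inside their subsequence.
    high = [p for p, s in enumerate(w) if s >= 2]
    i2 = knuth_index([w[p] & 1 for p in high])
    for p in high[:i2]:
        w[p] ^= 1
    return w, (i0, i1, i2)


def unbalance_q4(w, indices):
    """Inverse of balance_q4, undoing the steps in reverse order."""
    i0, i1, i2 = indices
    u = list(w)
    high = [p for p, s in enumerate(u) if s >= 2]
    for p in high[:i2]:
        u[p] ^= 1
    low = [p for p, s in enumerate(u) if s < 2]
    for p in low[:i1]:
        u[p] ^= 1
    for p in range(i0):
        u[p] ^= 2
    return u


u = [int(ch) for ch in "0110230210110003"]
w, idx = balance_q4(u)
print("".join(map(str, w)), idx)
```

The within-group swaps never move a symbol between groups, which is why the inverse can rebuild the same subsequences before undoing the group swap.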

It can be observed that the procedure above can be easily 
generalized for any q = 2 a with a > 2. If m = 2 b with b > a, 
then the number of bits to store the integers (locations) is 



log 2 9-1 

E 

3=0 



V log, 



qm 



(q - l)ab - q(a - 2) - 2. 



For instance, if q — 2 3 = 8 and m = 2 7 = 128, then 
k = 1024 and it requires 137 bits to represent the locations. 
These bits can be stored in 46 cells without balancing. 

In fact, the above idea can be generalized for an arbitrary 
q > 2. For instance, when q = 3, given an binary word 
u 6 G\ m , there exists an integer i such that u + i«o 3m - 1 
has exactly m 0s or m Is. Without loss of generality, we 
assume that it has exactly m 0s, then we can further balance 
the subsequence consisting of Is and 2s. Finally, we can get a 
balanced word with alphabet size 3. More generally, we have 
the following result. 

Theorem 4. Given an alphabet size $q = \alpha\beta$ with two integers
$\alpha$ and $\beta$, we divide all the levels into $\beta$ groups, denoted by
$\{0, \beta, 2\beta, \ldots\}$, $\{1, \beta+1, 2\beta+1, \ldots\}$, ...,
$\{\beta-1, 2\beta-1, 3\beta-1, \ldots\}$. Given any word $u \in G_q^{qm}$,
there exists an integer $i$ such that $u + \mathbf{1}^i \mathbf{0}^{qm-i}$
has exactly $\alpha m$ symbols in one of the first $\beta - 1$ groups.

Proof: Let us denote the groups by $S_0, S_1, \ldots, S_{\beta-1}$.
Given a sequence $u$, let $n_j$ denote the number of symbols
in $u$ that belong to $S_j$. Furthermore, let $n'_j$ denote the
number of symbols in $u + \mathbf{1}^{qm}$ that belong to $S_j$. It is easy to
see that $n'_{j+1} = n_j$ for all $j \in \{0, 1, \ldots, \beta-1\}$, where
$(\beta - 1) + 1 = 0$. We prove by contradiction that there exists
$j \in \{0, 1, \ldots, \beta-2\}$ such that $n_j \ge \alpha m \ge n'_j$ or
$n_j \le \alpha m \le n'_j$. Assume this statement is not true. Then either
$\min(n_j, n'_j) > \alpha m$ or $\max(n_j, n'_j) < \alpha m$ for all
$j \in \{0, 1, \ldots, \beta-2\}$. So if $n_1 > \alpha m$, we get
$n_j > \alpha m$ for all $j \in \{0, 1, \ldots, \beta-1\}$ iteratively.
Similarly, if $n_1 < \alpha m$, we get $n_j < \alpha m$ for all
$j \in \{0, 1, \ldots, \beta-1\}$ iteratively. Both cases contradict
the fact that $\sum_{j=0}^{\beta-1} n_j = \alpha\beta m$.

Note that the number of symbols in $u + \mathbf{1}^i \mathbf{0}^{qm-i}$ that
belong to $S_j$ changes by at most 1 if we increase $i$ by one. So if
there exists $j \in \{0, 1, \ldots, \beta-2\}$ such that
$n_j \ge \alpha m \ge n'_j$ or $n_j \le \alpha m \le n'_j$, there always
exists an integer $i$ such that $u + \mathbf{1}^i \mathbf{0}^{qm-i}$ has
exactly $\alpha m$ symbols in $S_j$.

This completes the proof. ■ 
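
The existence claim of Theorem 4 can be checked by brute force on small parameters. The sketch below is only an illustration: the function name `find_shift` is hypothetical, and returning the first pair (i, j) found is a choice made here. It increments the first i symbols modulo q and tracks how many symbols fall into each group S_j = {v : v mod β = j}.

```python
from itertools import product


def find_shift(u, alpha, beta):
    """Return the first (i, j) with 0 <= i <= len(u) and 0 <= j <= beta - 2 such
    that adding 1 (mod q) to the first i symbols of u leaves exactly alpha * m
    symbols in group S_j, or None if no such pair exists."""
    q = alpha * beta
    if beta < 2 or len(u) % q != 0:
        raise ValueError("need beta >= 2 and len(u) a multiple of q")
    target = alpha * (len(u) // q)
    counts = [0] * beta            # counts[j] = number of symbols in S_j
    for v in u:
        counts[v % beta] += 1
    for i in range(len(u) + 1):
        # counts now describes u with its first i symbols incremented.
        for j in range(beta - 1):
            if counts[j] == target:
                return i, j
        if i < len(u):
            v = u[i]
            counts[v % beta] -= 1
            counts[(v + 1) % q % beta] += 1
    return None


# Exhaustive check for q = 3 (alpha = 1, beta = 3) and m = 2.
print(all(find_shift(u, 1, 3) is not None
          for u in product(range(3), repeat=6)))
```

Updating the group counts incrementally mirrors the key step of the proof: each increase of i moves exactly one symbol from one group to the next, so every count changes by at most 1.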

Based on the above result, given any q, we can always split
all the levels into two groups and balance them (so that
the number of symbols belonging to a group is proportional to
the number of levels in that group). Then we can balance the
levels within each group. Iterating this step, all the levels
become balanced. Recovering the original message requires roughly

$$(q-1) \log_2 q \, \log_2 m$$

bits for storing additional information when m is large. If we 
store this additional information as a prefix using a shorter bal- 
anced code, then we get a generalized construction of Knuth's 
code. If we follow the steps in Section WU\ by further adding 
parity-check bits, then we get a partial-balanced code with 
error-correcting capability, based on which we can implement 
partial-balanced modulation for multiple-level cells. 

Now, if we have a code that uses 'full' sets of balanced
codewords, then the redundancy is

$$\log_2 q^{qm} - \log_2 \binom{qm}{m, m, \ldots, m} \simeq \frac{q - \log_2 q}{2} \log_2 m$$

bits. So given an alphabet size q, the redundancy of the
above method is about $\frac{2(q-1)}{q - \log_2 q} \log_2 q$ times as high
as that of codes that use 'full' sets of balanced codewords. For
$q = 2, 3, 4, 5, \ldots, 10$, we list these factors as follows:

2.0000, 4.4803, 6.0000, 6.9361, 7.5694,
8.0351, 8.4000, 8.6995, 8.9539.

It shows that as q increases, the above method becomes less 
information efficient. How to construct balanced codes for a 
nonbinary alphabet in a simple, efficient and computationally 
fast way is still an open question. It is even more difficult 
to construct balanced error-correcting codes for nonbinary 
alphabets. 
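
The nine factors listed above can be reproduced numerically. The sketch below assumes that the ratio has the closed form 2(q-1) log2(q) / (q - log2(q)), which matches every listed value to four decimal places; the function name `redundancy_factor` is hypothetical.

```python
import math


def redundancy_factor(q):
    """Assumed closed form of the ratio between the redundancy of the iterative
    balancing method and that of a code using 'full' sets of balanced codewords."""
    lg = math.log2(q)
    return 2 * (q - 1) * lg / (q - lg)


for q in range(2, 11):
    print(q, f"{redundancy_factor(q):.4f}")
```

The factor grows with q over this range, which is the trend the text describes.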

IX. Conclusion 

In this paper, we introduced balanced modulation for
reading/writing in nonvolatile memories. Based on the construction
of balanced codes or balanced error-correcting codes, balanced
modulation can minimize the effect of asymmetric noise,
especially noise introduced by cell-level drift. Hence, it can
significantly reduce the bit error rate in nonvolatile memories.
Compared to other schemes, balanced modulation is easy to
implement in current memory systems and does not require
any assumptions about the cell-level distributions, which makes
it very practical. Furthermore, we studied the construction of
balanced error-correcting codes, in particular balanced LDPC
codes. These codes have very efficient encoding and decoding
algorithms, and they are more efficient than prior constructions
of balanced error-correcting codes.